We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite its recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet it is obviously undesirable to converge to such solutions. Our central insight is that careful design of the optimization dynamics is critical to learning meaningful representations. We identify that a faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse. Then, in an idealized setup, we show that the self-predictive learning dynamics carry out a spectral decomposition of the state transition matrix, effectively capturing information about the transition dynamics. Building on these theoretical insights, we propose bidirectional self-predictive learning, a novel self-predictive algorithm that learns two representations simultaneously. We examine the robustness of our theoretical insights with a number of small-scale experiments and showcase the promise of the novel representation learning algorithm with large-scale experiments.
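The two design choices this abstract highlights, a faster-paced optimization of the predictor and semi-gradient (stop-gradient) updates on the representation, can be illustrated with a minimal linear sketch. This is an assumption-laden toy, not the paper's algorithm: the matrices, learning rates, and inner-loop count are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

d_state, d_latent = 8, 3
Phi = rng.normal(size=(d_latent, d_state)) * 0.1   # representation (encoder), illustrative
P = rng.normal(size=(d_latent, d_latent)) * 0.1    # latent predictor, illustrative

def semi_gradient_step(s, s_next, lr_phi=0.01, lr_p=0.1, predictor_steps=5):
    """One update on the loss ||P Phi s - stopgrad(Phi s_next)||^2.

    Two collapse-avoiding choices from the abstract:
    - the predictor P takes several (faster-paced) steps per encoder step,
    - the target Phi s_next is treated as a constant (semi-gradient):
      no gradient flows through the bootstrapped target.
    """
    global Phi, P
    target = Phi @ s_next                     # stop-gradient target: held fixed below
    # faster-paced inner loop on the predictor
    for _ in range(predictor_steps):
        err = P @ (Phi @ s) - target
        P -= lr_p * np.outer(err, Phi @ s)
    # single semi-gradient step on the representation
    err = P @ (Phi @ s) - target
    Phi -= lr_phi * P.T @ np.outer(err, s)    # gradient only through Phi s, not the target
    return float(err @ err)

s, s_next = rng.normal(size=d_state), rng.normal(size=d_state)
losses = [semi_gradient_step(s, s_next) for _ in range(200)]
```

Because the target is held fixed within each step, the representation is not pulled directly toward the trivial constant solution, which is the intuition the abstract formalizes.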
We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually complex environments. BYOL-Explore learns a world representation, the world dynamics, and an exploration policy all together by optimizing a single prediction loss in the latent space, with no additional auxiliary objective. We show that BYOL-Explore is effective in DM-HARD-8, a challenging partially observable, continuous-action, hard-exploration benchmark with visually rich 3-D environments. On this benchmark, we solve the majority of the tasks purely by augmenting the extrinsic reward with BYOL-Explore's intrinsic reward, whereas prior work could only get off the ground with human demonstrations. As further evidence of the generality of BYOL-Explore, we show that it achieves superhuman performance on the ten hardest exploration games in Atari while having a much simpler design than other competitive agents.
Meaningful and simplified representations of neural activity can yield insights into how and what information is being processed within a neural circuit. However, without labels, finding representations that reveal the link between the brain and behavior can be challenging. Here, we introduce a novel unsupervised approach for learning disentangled representations of neural activity called Swap-VAE. Our approach combines a generative modeling framework with an instance-specific alignment loss that tries to maximize the representational similarity between transformed views of the input (brain state). These transformed (or augmented) views are created by dropping out neurons and jittering samples in time, which intuitively should lead the network to a representation that maintains both temporal consistency and invariance to the specific neurons used to represent the neural state. Through evaluations on both synthetic data and neural recordings from hundreds of neurons in different primate brains, we show that it is possible to build representations that disentangle neural datasets along relevant latent dimensions linked to behavior.
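The two augmentations this abstract describes (neuron dropout and temporal jitter) are simple to sketch. The function name, rates, and array layout below are illustrative assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x, drop_prob=0.2, max_jitter=2):
    """Create one augmented view of a neural-activity window.

    x: array of shape (time, neurons) of spike counts or firing rates.
    - neuron dropout: zero out a random subset of neurons, encouraging
      invariance to which neurons happen to represent the state,
    - temporal jitter: shift the window by a few time bins, encouraging
      temporal consistency of the latent state.
    """
    t, n = x.shape
    keep = rng.random(n) >= drop_prob          # random mask over neurons
    view = x * keep                            # drop entire neurons (columns)
    shift = rng.integers(-max_jitter, max_jitter + 1)
    view = np.roll(view, shift, axis=0)        # jitter in time (wrap-around shift)
    return view

x = rng.poisson(2.0, size=(10, 50)).astype(float)
v1, v2 = augment(x), augment(x)                # two views of the same window
```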
State-of-the-art methods for self-supervised learning (SSL) build representations by maximizing the similarity between different transformed "views" of a sample. Without sufficient diversity in the transformations used to create views, however, it can be difficult to overcome nuisance variables in the data and build rich representations. This motivates the use of the dataset itself to find similar, yet distinct, samples to serve as views for one another. In this paper, we introduce Mine Your Own vieW (MYOW), a new approach to self-supervised learning that looks within the dataset to define diverse targets for prediction. The idea behind our approach is to actively mine views, finding samples that are neighbors in the representation space of the network, and then predict, from one sample's latent representation, the representation of a nearby sample. After showing the promise of MYOW on benchmarks used in computer vision, we highlight the power of this idea in a novel application in neuroscience, where SSL has yet to be applied. When tested on multi-unit neural recordings, we find that MYOW outperforms other self-supervised approaches in all examples (in some cases by more than 10%), and often surpasses the supervised baseline. With MYOW, we show that it is possible to harness the diversity of the data to build rich views and leverage self-supervision in new domains where augmentations are limited or unknown.
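The view-mining step the abstract describes, finding neighbors in the network's representation space to serve as prediction targets, can be sketched as a brute-force nearest-neighbor search. This is a minimal illustration under assumed names and shapes, not MYOW's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mine_views(latents, k=3):
    """For each sample, find its k nearest neighbors in latent space.

    latents: (n_samples, d) representations from the current network.
    Returns an (n_samples, k) index array; each row lists candidate
    'mined views' whose representations that sample would be trained
    to predict (self-matches are excluded).
    """
    # pairwise squared Euclidean distances, (n, n)
    d2 = ((latents[:, None, :] - latents[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)              # never mine a sample as its own view
    return np.argsort(d2, axis=1)[:, :k]

z = rng.normal(size=(16, 4))                  # toy batch of latent representations
neighbors = mine_views(z, k=3)
```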
Self-supervised learning provides a promising path towards eliminating the need for costly label information in representation learning on graphs. However, to achieve state-of-the-art performance, methods often need large numbers of negative examples and rely on complex augmentations. This can be prohibitively expensive, especially for large graphs. To address these challenges, we introduce Bootstrapped Graph Latents (BGRL) - a graph representation learning method that learns by predicting alternative augmentations of the input. BGRL uses only simple augmentations and alleviates the need for contrasting with negative examples, and is thus scalable by design. BGRL outperforms or matches prior methods on several established benchmarks, while achieving a 2-10x reduction in memory costs. Furthermore, we show that BGRL can be scaled up to extremely large graphs with hundreds of millions of nodes in the semi-supervised regime - achieving state-of-the-art performance and improving over supervised baselines where representations are shaped only through label information. In particular, our solution centered on BGRL constituted one of the winning entries to the Open Graph Benchmark - Large-Scale Challenge at KDD Cup 2021, on a graph orders of magnitude larger than all previously available benchmarks, thus showing the scalability and effectiveness of our approach.
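The "simple augmentations" that bootstrapped graph methods of this kind typically rely on are edge dropping and node-feature masking. The sketch below is an assumed illustration of such augmentations (function name and rates are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_graph(adj, feats, p_edge=0.2, p_feat=0.1):
    """One simple graph augmentation: drop edges and mask feature dims.

    adj: (n, n) symmetric 0/1 adjacency matrix; feats: (n, d) node features.
    """
    mask = rng.random(adj.shape) >= p_edge
    mask = np.triu(mask, 1)
    mask = mask | mask.T                      # keep the graph undirected
    adj_view = adj * mask                     # randomly drop edges
    feat_mask = rng.random(feats.shape[1]) >= p_feat
    feats_view = feats * feat_mask            # zero out masked feature dimensions
    return adj_view, feats_view

n, d = 6, 4
adj = (rng.random((n, n)) < 0.5).astype(int)
adj = np.triu(adj, 1)
adj = adj + adj.T                             # toy undirected graph
feats = rng.normal(size=(n, d))
adj_view, feats_view = augment_graph(adj, feats)
```

Two such views of the same graph would then be fed to the online and target encoders, with one predicting the other's representation rather than contrasting against negatives.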
We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the-art methods rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches 74.3% top-1 classification accuracy on ImageNet using a linear evaluation with a ResNet-50 architecture and 79.6% with a larger ResNet. We show that BYOL performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks. Our implementation and pretrained models are available on GitHub. (* Equal contribution; the order of first authors was randomly selected.)
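The slow-moving-average target update described here can be sketched in a few lines. This is a minimal illustration; the parameter representation and the decay value tau are assumptions, not BYOL's actual configuration:

```python
import numpy as np

def ema_update(target_params, online_params, tau=0.99):
    """BYOL-style target-network update: an exponential moving average
    of the online network's parameters. tau close to 1 means the target
    moves slowly, which stabilizes the bootstrapped prediction target."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]

# toy 'networks' as lists of parameter arrays
online = [np.ones((2, 2)), np.zeros(3)]
target = [np.zeros((2, 2)), np.ones(3)]
for _ in range(100):
    target = ema_update(target, online)       # target drifts slowly toward online
```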
The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance.
We consider the problem of provably optimal exploration in reinforcement learning for finite-horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of Õ(√(HSAT) + H²S²A + H√T), where H is the time horizon, S the number of states, A the number of actions and T the number of time-steps. This result improves over the best previously known bound Õ(HS√(AT)) achieved by the UCRL2 algorithm of Jaksch et al. (2010). The key significance of our new results is that when T ≥ H³S³A and SA ≥ H, it leads to a regret of Õ(√(HSAT)) that matches the established lower bound of Ω(√(HSAT)) up to a logarithmic factor. Our analysis contains two key insights. We use careful application of concentration inequalities to the optimal value function as a whole, rather than to the transition probabilities (to improve scaling in S), and we define Bernstein-based "exploration bonuses" that use the empirical variance of the estimated values at the next states (to improve scaling in H).
Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label the attacker is interested in exploiting. A backdoored DNN performs well on clean test images, yet persistently predicts an attacker-defined label for any sample in the presence of the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, there are very few works that explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective in the video domain. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects into that model. We show that poisoned-label image backdoor attacks can be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability in the video domain. Finally, for the first time, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, where we show that attacking a single modality is enough to achieve a high attack success rate.
Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and their ability to collaboratively and autonomously provide services. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms, such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surface (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions for DL-enabled UAV swarms. In summary, this article provides a comprehensive overview of the various DL applications for UAV swarms across a wide range of scenarios.